What is claimed is:

1. A file content classification system comprising:
a digital ID generator; 

an ID appearance database coupled to receive IDs from the ID 
generator; and 

a characteristic comparison routine identifying the file as having a 
characteristic based on ID appearance in the appearance database. 

2. The content classification system of claim 1 wherein said ID 
generator comprises a hashing algorithm. 

3. The content classification system of claim 2 wherein said hashing
algorithm is the MD5 hashing algorithm.

4. The content classification system of claim 1 wherein said ID 
appearance database tracks the frequency of appearance of a digital ID. 




5. The content classification system of claim 1 further including a
plurality of digital ID generators on different systems all coupled to and
providing IDs to said ID appearance database.



6. The content classification system of claim 5 wherein said plurality
of digital ID generators are coupled to said database via a combination of
public and private networks.

7. The content classification system of claim 6 wherein said database
is coupled to an intermediate server which is coupled to said plurality of
generators.

8. The content classification system of claim 6 wherein said
intermediate server is a web server.

9. The content classification system of claim 1 wherein said
characteristic comprises junk e-mail and said characteristic is defined by
a frequency of appearance of a digital ID.

10. A method for identifying a characteristic of a data file, comprising:

generating a digital identifier for the data file and forwarding the
identifier to a processing system;

determining whether the forwarded identifier matches a
characteristic of other identifiers; and

processing the email based on said step of determining.

11. The method of claim 10 wherein said step of generating comprises
hashing at least a portion of the data file.

12. The method of claim 11 wherein said step of hashing comprises
using the MD5 hash.

13. The method of claim 11 wherein said step of generating comprises
hashing multiple portions of the data file.

14. The method of claim 10 wherein said data file is an email message
and said step of determining comprises determining whether said email is
spam.

15. The method of claim 10 wherein said step of determining identifies
said e-mail as spam by tracking the rate per unit time a digital ID is
generated.

16. The method of claim 10 wherein said step of generating comprises
generating IDs at a plurality of source systems all coupled via a network
to at least one processing system performing the determining step.

17. The method of claim 16 wherein said step of processing comprises
instructing said plurality of source systems to perform an action with the
email based on said determining step.

18. A method of filtering an email message, comprising:

processing the message to provide a digital identifier;

comparing the digital identifier to a characteristic database of digital
identifiers to determine whether the message has said characteristic; and

processing the message based on said step of comparing.

19. The method of claim 18 wherein said step of processing occurs on
at least one first system, and said step of comparing occurs on a second
system.

20. The method of claim 19 wherein said step of processing occurs on
a plurality of first systems.

21. The method of claim 19 wherein said at least one first system and
second system are coupled by the Internet.

22. The method of claim 18 wherein said step of comparing comprises
determining the frequency of a particular ID occurring in a time period,
classifying said ID as having a characteristic, and comparing digital
identifiers to said classified IDs.

23. A file content classification system, comprising:

a first system having a file to be classified;

a file ID generator on the first system;

a database on a second system coupled to the ID generator to
receive IDs generated by the ID generator;

a comparison routine on the second system classifying the ID
relative to the database as meeting or not meeting a characteristic.

24. The system of claim 23 including a plurality of first systems each
including a respective file ID generator coupled to the database on the
second system.

25. The system of claim 24 wherein the plurality of first systems is
coupled to the second system via the Internet.

26. The system of claim 25 wherein the second system comprises a
web server interface system and a database system, wherein the database
system is isolated from the Internet by the web server system.

27. A content classification system for a first and second computer
coupled by a network, comprising:

a client agent file identifier generator on the first computer; and

a server comparison agent and data structure on the second
computer receiving identifiers from the client agent and providing replies
to the client agent;

wherein the client agent processes the file based on replies from the
server comparison agent.

28. A method for providing a service on the Internet, comprising:

collecting data from a plurality of systems having a client agent on
the Internet to a server having a database;

characterizing the data received relative to information collected in
the database; and

transmitting a content identifier to the client agent.

29. The method of claim 28 wherein said step of collecting comprises
collecting a digital identifier for a data file.

30. The method of claim 28 wherein said data file is an e-mail. 



31. The method of claim 29 wherein said step of characterizing
comprises:

tracking the frequency of the collection of a particular identifier;

characterizing the data file based on said frequency;

storing the characterization; and

comparing collected identifiers to the known characterization.
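
By way of illustration only, the digital ID generation and frequency tracking recited in the claims above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names generate_id and AppearanceDatabase, the threshold of three appearances, and the one-hour sliding window are all hypothetical choices made for the example. Only the use of an MD5 hash and the idea of tracking how often an ID appears per unit time come from the claims.

```python
import hashlib
import time
from collections import defaultdict, deque
from typing import Optional


def generate_id(data: bytes) -> str:
    """Return a digital ID for file content: the MD5 hex digest of the bytes."""
    return hashlib.md5(data).hexdigest()


class AppearanceDatabase:
    """Hypothetical in-memory stand-in for an ID appearance database."""

    def __init__(self, threshold: int = 3, window_seconds: float = 3600.0):
        # threshold and window_seconds are illustrative assumptions,
        # not values taken from the claims.
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # ID -> timestamps of recent reports

    def report(self, digital_id: str, now: Optional[float] = None) -> bool:
        """Record one appearance of an ID.

        Returns True when the ID has appeared at least `threshold` times
        within the sliding window, i.e. it has the tracked characteristic.
        """
        now = time.time() if now is None else now
        stamps = self._seen[digital_id]
        stamps.append(now)
        # Discard appearances that are older than the sliding window.
        while stamps and now - stamps[0] > self.window_seconds:
            stamps.popleft()
        return len(stamps) >= self.threshold


if __name__ == "__main__":
    db = AppearanceDatabase(threshold=3, window_seconds=60.0)
    message = b"Buy now!"
    # Three source systems report the same message ten seconds apart.
    flags = [db.report(generate_id(message), now=t) for t in (0.0, 10.0, 20.0)]
    print(flags)  # [False, False, True]
```

In this sketch the client side only computes and forwards the ID, while the decision is made where the appearance counts are kept, which mirrors the split between first and second systems in claims 19 and 23.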